Simple B&B restart#1415
Conversation
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…an method. Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…ogs to use the std::format variant. Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…ly with restarts. Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…restart Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughThis pull request implements a restart mechanism for branch-and-bound MIP solving. It extracts search tree management into a dedicated header, adds restart-triggered exploration loops, integrates concurrent halt signaling across workers and heuristics, and implements worker state reset for efficient restarts without full reallocation. ChangesRestart-driven branch-and-bound exploration and integration
🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/mip_heuristics/diversity/lns/rins.cu (1)
217-232:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMake the CPUFJ child task shutdown exception-safe.
fj_cpu->preemption_flagis rebound to a stack-localfj_halt_flag, thencpufj_solve()is launched as a child OpenMP task. The stop path only happens later on the normal path. If any call in between throws, the task can keep polling a dead stack atomic whilefj_cpuis being unwound. Please move the halt flag/task ownership behind an RAII guard or a task object with stable lifetime so stop/join happens on every exit path.As per coding guidelines,
**/*.{cpp,cu,hpp,cuh}: Use RAII for all resource management including exception paths in C++.Also applies to: 268-307
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/mip_heuristics/diversity/lns/rins.cu` around lines 217 - 232, The child OpenMP task binds fj_cpu->preemption_flag to the stack-local fj_halt_flag which can dangle if an exception is thrown before the task completes; fix by giving the task a stable, RAII-managed flag and ownership so it is valid for the task's lifetime: allocate the halt flag with shared lifetime (e.g., a heap/shared_ptr<atomic<bool>> or embed it inside a RAII task object) and set fj_cpu->preemption_flag to that stable address, ensure the RAII object owns joining/stopping the task and is destroyed on all exit paths; update the code paths around fj_cpu, fj_halt_flag, and cpufj_solve to use that RAII-managed flag so preemption_flag never points to stack memory.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/branch_and_bound/branch_and_bound.cpp`:
- Around line 743-745: The logged/exported counter was switched to
exploration_stats_.total_nodes_explored but solve_node_deterministic() only
increments exploration_stats_.nodes_explored, so deterministic runs report zero;
update the producer-side accounting inside solve_node_deterministic() (or the
deterministic node-visit path) to also increment
exploration_stats_.total_nodes_explored (or increment both
exploration_stats_.nodes_explored and exploration_stats_.total_nodes_explored)
wherever nodes are counted, ensuring the same counter used by the log/return
paths is updated for deterministic and opportunistic solves.
- Around line 1563-1572: The debug call in branch_and_bound.cpp using
settings_.log.debug mixes printf-style format specifiers with a "{}" placeholder
so nodes_since_last_check is never printed; change the "{}" to the appropriate
printf specifier (e.g., "%d") so the format string matches the arguments
(num_nodes, current_progress, current_abs_gap, nodes_since_last_check,
progress_since_last_check, gap_reduction, tree_size_estimate) and ensure the
order of placeholders aligns with those variables.
- Around line 2982-2985: The restart checkpoint mixes internal and user-space
units: set exploration_stats_.restart_gap_at_last_check using user-space units
(call compute_user_abs_gap() or otherwise convert upper_bound_ -
get_lower_bound() into user-space) so it matches what should_restart() compares;
update the initialization in the same block (replace upper_bound_ -
get_lower_bound() with compute_user_abs_gap()) to keep units consistent with
should_restart() and compute_user_abs_gap().
In `@cpp/src/branch_and_bound/worker.hpp`:
- Around line 230-237: bfs_worker_t::reset_state() currently clears node_queue
and zeroes total_active_diving_workers but does not reset the per-strategy
active_diving_workers array, leaving stale counts; modify reset_state() to also
zero all entries of active_diving_workers (e.g., fill or loop over
active_diving_workers to set each element to 0) so per-strategy state is
consistent after restart, while keeping the existing resets for
total_max_diving_workers, total_active_diving_workers, is_active, and
lower_bound.
---
Outside diff comments:
In `@cpp/src/mip_heuristics/diversity/lns/rins.cu`:
- Around line 217-232: The child OpenMP task binds fj_cpu->preemption_flag to
the stack-local fj_halt_flag which can dangle if an exception is thrown before
the task completes; fix by giving the task a stable, RAII-managed flag and
ownership so it is valid for the task's lifetime: allocate the halt flag with
shared lifetime (e.g., a heap/shared_ptr<atomic<bool>> or embed it inside a RAII
task object) and set fj_cpu->preemption_flag to that stable address, ensure the
RAII object owns joining/stopping the task and is destroyed on all exit paths;
update the code paths around fj_cpu, fj_halt_flag, and cpufj_solve to use that
RAII-managed flag so preemption_flag never points to stack memory.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: baf4c7cc-54ed-4aa7-8328-a5f69d4d0b1f
📒 Files selected for processing (18)
cpp/src/branch_and_bound/branch_and_bound.cppcpp/src/branch_and_bound/branch_and_bound.hppcpp/src/branch_and_bound/mip_node.hppcpp/src/branch_and_bound/node_queue.hppcpp/src/branch_and_bound/pseudo_costs.cppcpp/src/branch_and_bound/pseudo_costs.hppcpp/src/branch_and_bound/search_tree.hppcpp/src/branch_and_bound/symmetry.hppcpp/src/branch_and_bound/worker.hppcpp/src/branch_and_bound/worker_pool.hppcpp/src/dual_simplex/simplex_solver_settings.hppcpp/src/dual_simplex/solve.cppcpp/src/mip_heuristics/diversity/lns/rins.cucpp/src/mip_heuristics/diversity/recombiners/sub_mip.cuhcpp/src/mip_heuristics/feasibility_jump/fj_cpu.cucpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuhcpp/src/mip_heuristics/solver.cucpp/src/mip_heuristics/solver_context.cuh
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
|
/ok to test 3037981 |
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
This PR implements a simple restart procedure for B&B. More specifically,
Nconsecutive times, then it trigger a restart.References
[1] G. Hendel, “Adaptive solver behavior in mixed-integer programming,” Doctoral, Technische Universität Berlin, Berlin, 2022. [Online]. Available: https://depositonce.tu-berlin.de/items/d6d22f30-37b3-42ed-afb9-be9c0aaa2931
[2] "HiGHS - Linear optimization software". [Online]. Available: https://github.com/ERGO-Code/HiGHS
Results
MIPLIB2017, 10min, GH200
Overall, I see an increase of the number of optimal solutions by
3(unitcal_7,glass-sc,rococoC10-001000), but it no longer proves optimality forneos-5093327-huahum.Checklist